In today’s lab, you’ll practice building workflows with recipes, parsnip models, rsample cross-validation, and model comparison in the context of time series data.
Packages
library(magrittr) # the pipe
library(tidyverse) # for data wrangling + visualization
library(tidymodels) # for modeling
library(modeltime) # for modeling time series
library(ggplot2) # for plotting
# set the default theme for plotting
theme_set(theme_bw(base_size = 18) + theme(legend.position = "top"))
The Data
Today we will be using electricity demand data, based on a paper by James W Taylor:
Taylor, J.W. (2003) Short-term electricity demand forecasting using double seasonal exponential smoothing. Journal of the Operational Research Society, 54, 799-805.
The data can be found in the timetk package as timetk::taylor_30_min, a tibble with dimensions 4,032 × 2 (rows × columns):
date: A date-time variable in 30-minute increments
value: Electricity demand in megawatts
data <- timetk::taylor_30_min
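Before plotting, a quick sanity check that the data match the description above can be useful. This is a minimal sketch in base R, assuming `date` is stored as a POSIXct date-time (as the 30-minute description suggests); `step_seconds` is a hypothetical helper name, not part of the lab.

```r
data <- timetk::taylor_30_min

# dimensions: expected to be 4,032 rows and 2 columns (date, value)
dim(data)

# gaps between consecutive timestamps, in seconds (30 minutes = 1800 s)
step_seconds <- diff(as.numeric(data$date))
table(step_seconds)
```

If the table shows a single value of 1800, the series is regular with no missing half-hours.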
Exercise 1: EDA
Plot the data using the functions timetk::plot_time_series, timetk::plot_acf_diagnostics (using 100 lags), and timetk::plot_seasonal_diagnostics.
SOLUTION:
# plot the data
timetk::taylor_30_min %>%
  timetk::plot_time_series(
    date, value,
    .title = "Short-term electricity demand (30 min)"
  )
# plot the ACF and PACF
timetk::taylor_30_min %>%
  timetk::plot_acf_diagnostics(
    date, value,
    .lags = 100,
    .title = "Lag Diagnostics - Short-term electricity demand (30 min)"
  )
# plot the seasonal diagnostics
timetk::taylor_30_min %>%
  timetk::plot_seasonal_diagnostics(
    date, value,
    .title = "Seasonal Diagnostics - Short-term electricity demand (30 min)"
  )
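To read the lag plot numerically, you can pull individual autocorrelations with base R's `stats::acf()`. This is a sketch, assuming the lags of interest are half a day (24 half-hours), one day (48 half-hours), and one week (336 half-hours); note that the 100 lags plotted above cover only about two days, so the weekly lag is not visible there. `acf_at` is a hypothetical helper.

```r
data <- timetk::taylor_30_min

# autocorrelation up to one week of half-hour lags (7 * 48 = 336)
acf_res <- stats::acf(data$value, lag.max = 336, plot = FALSE)

# acf_res$acf[1] is lag 0, so lag k sits at position k + 1
acf_at <- function(k) acf_res$acf[k + 1]

c(
  half_day = acf_at(24),
  one_day  = acf_at(48),
  one_week = acf_at(336)
)
```

If demand follows daily and weekly cycles, the one-day and one-week values should be large and positive compared with the half-day value.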
Exercise 2: Time scaling
The raw data are recorded at 30-minute intervals. Downsample the data to 60-minute intervals using timetk::summarise_by_time, computing the hourly electricity demand (value) as the sum of the two 30-minute readings in each hour. Assign the downsampled data to the variable taylor_60_min.
SOLUTION:
# downscale the data (down to a lower frequency of measurement)
set.seed(8740)
taylor_60_min <- timetk::taylor_30_min %>%
  timetk::summarise_by_time(
    .date_var = date,
    .by = "hour",
    value = sum(value)
  )
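One way to confirm the aggregation did what was intended is to repeat it by hand with dplyr and lubridate and compare. This sketch assumes `summarise_by_time(.by = "hour")` floors each timestamp to the start of its hour, and that the series starts on the hour; `manual_60` is a hypothetical name.

```r
library(magrittr)

taylor_30 <- timetk::taylor_30_min

taylor_60_min <- taylor_30 %>%
  timetk::summarise_by_time(.date_var = date, .by = "hour", value = sum(value))

# the same aggregation with dplyr: floor each timestamp to the hour, then sum
manual_60 <- taylor_30 %>%
  dplyr::group_by(date = lubridate::floor_date(date, unit = "hour")) %>%
  dplyr::summarise(value = sum(value), n = dplyr::n(), .groups = "drop")

nrow(taylor_60_min)                              # half as many rows as the 30-minute data
all(manual_60$n == 2)                            # each hour should hold exactly two readings
all.equal(taylor_60_min$value, manual_60$value)  # both approaches should agree
all.equal(sum(taylor_60_min$value), sum(taylor_30$value))  # total demand is preserved
```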
Exercise 3: Training and test datasets
Split the new (60 min) time series into training and test sets using timetk::time_series_split
Set the training period (‘initial’) to ‘2 months’ and the assessment period (‘assess’) to ‘1 week’.
Convert the split into a resampling plan with timetk::tk_time_series_cv_plan() and plot it with timetk::plot_time_series_cv_plan().
Extract the training and test sets with rsample::training() and rsample::testing().
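The steps above can be sketched as follows. This is one possible solution, not the lab's official one; it assumes the hourly data are rebuilt as in Exercise 2, and the names `splits`, `cv_plan`, `cv_plot`, `train_tbl`, and `test_tbl` are illustrative choices.

```r
library(magrittr)

taylor_60_min <- timetk::taylor_30_min %>%
  timetk::summarise_by_time(.date_var = date, .by = "hour", value = sum(value))

# hold out the final week for testing; train on the two months before it
splits <- taylor_60_min %>%
  timetk::time_series_split(
    date_var   = date,
    initial    = "2 months",
    assess     = "1 week",
    cumulative = FALSE
  )

# view the split as a resampling plan and plot it (static ggplot)
cv_plan <- timetk::tk_time_series_cv_plan(splits)
cv_plot <- cv_plan %>%
  timetk::plot_time_series_cv_plan(date, value, .interactive = FALSE)

# separate the training and test sets
train_tbl <- rsample::training(splits)
test_tbl  <- rsample::testing(splits)
```

Every test timestamp should fall after every training timestamp, which is what distinguishes a time series split from a random one.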
This is a good place to render, commit, and push changes to your remote lab repo on GitHub. Click the checkbox next to each file in the Git pane to stage the updates you’ve made, write an informative commit message, and push. After you push the changes, the Git pane in RStudio should be empty.
Exercise 7: calibrate
In this exercise we’ll use the testing data with our fitted models.
Create a table with the fitted workflows using modeltime::modeltime_table
Using the table you just created, run a calibration on the test data with the function modeltime::modeltime_calibrate.
Compare the accuracy of the models by applying modeltime::modeltime_accuracy() to the calibration results.
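The models fitted in the earlier exercises are not part of this section, so the sketch below uses two hypothetical stand-ins: linear regressions (`fit_hour`, `fit_hour_wday`) on a time trend plus hour-of-day and day-of-week factors. It only illustrates the table → calibrate → accuracy pipeline; its numbers will differ from the table below, which comes from the lab's ETS, ARIMA, and GLMNET models.

```r
library(magrittr)
library(parsnip)
library(modeltime)

taylor_60_min <- timetk::taylor_30_min %>%
  timetk::summarise_by_time(.date_var = date, .by = "hour", value = sum(value))

splits <- timetk::time_series_split(
  taylor_60_min, date_var = date,
  initial = "2 months", assess = "1 week", cumulative = FALSE
)

# hypothetical stand-in models: linear regressions on calendar features of `date`
fit_hour <- linear_reg() %>%
  set_engine("lm") %>%
  fit(value ~ as.numeric(date) + factor(lubridate::hour(date)),
      data = rsample::training(splits))

fit_hour_wday <- linear_reg() %>%
  set_engine("lm") %>%
  fit(value ~ as.numeric(date) + factor(lubridate::hour(date)) +
        factor(lubridate::wday(date)),
      data = rsample::training(splits))

# 1. collect the fitted models in a modeltime table
models_tbl <- modeltime_table(fit_hour, fit_hour_wday)

# 2. calibrate: predict the held-out week and store the residuals
calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = rsample::testing(splits))

# 3. compare accuracy metrics on the test week
accuracy_tbl <- calibration_tbl %>% modeltime_accuracy()
accuracy_tbl
```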
.model_id  .model_desc              .type  mae   mape  mase  smape  rmse  rsq
1          ETS(M,AD,M)              Test   6529  10.6  2.68  11.0   7222  0.735
2          ARIMA(3,0,0)(2,1,0)[24]  Test   8071  12.7  3.32  13.8   9511  0.718
3          GLMNET                   Test   3022  5.25  1.24  5.43   3573  0.936
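The GLMNET model (a penalized linear regression) is the best fit on every metric: it has the lowest MAE, MAPE, MASE, SMAPE, and RMSE (3573, versus 7222 for ETS and 9511 for ARIMA) and the highest R² (0.936). This is likely because electricity demand is strongly periodic, and the linear model explicitly includes Fourier (periodic) terms as predictors. The ETS and ARIMA models are more general-purpose; for example, the ARIMA model's seasonal component uses a single 24-hour period, as the [24] in its label shows.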
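To see why periodic regressors help, here is a sketch that fits an ordinary `lm` (not the lab's GLMNET model) on hand-built Fourier terms for an assumed daily (24-hour) and weekly (168-hour) cycle. `fourier_terms` is a hypothetical helper and K = 3 harmonics is an arbitrary choice; in a recipe, timetk's `step_fourier()` generates similar features.

```r
library(magrittr)

taylor_60_min <- timetk::taylor_30_min %>%
  timetk::summarise_by_time(.date_var = date, .by = "hour", value = sum(value))

# hours elapsed since the first observation
t_hours <- as.numeric(difftime(taylor_60_min$date, min(taylor_60_min$date), units = "hours"))

# hypothetical helper: K sine/cosine pairs for a cycle lasting `period` hours
fourier_terms <- function(t, period, K) {
  pairs <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  m <- do.call(cbind, pairs)
  colnames(m) <- paste0(rep(c("sin", "cos"), K), "_", period, "_", rep(seq_len(K), each = 2))
  m
}

X <- cbind(
  fourier_terms(t_hours, period = 24,  K = 3),  # daily cycle and its harmonics
  fourier_terms(t_hours, period = 168, K = 3)   # weekly cycle and its harmonics
)

# share of variance in hourly demand explained by the periodic terms alone
fit <- lm(taylor_60_min$value ~ X)
summary(fit)$r.squared
```

A high R² here would support the explanation above: a handful of sine and cosine terms already describe much of the demand pattern, which gives the Fourier-based linear model an advantage.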
Exercise 8: forecast - training data
Use the calibration table with modeltime::modeltime_forecast to plot each model's predictions over the test period against the observed values.
Now refit the models using the full data set (using the calibration table and modeltime::modeltime_refit). Save the result in the variable refit_tbl.
Pass refit_tbl to modeltime::modeltime_forecast with the argument h = ‘2 weeks’ (remember to also set the actual_data argument). This uses the refitted models to forecast electricity demand two weeks into the future.
Plot the forecast with modeltime::plot_modeltime_forecast.
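The forecasting steps above can be sketched end to end as follows. To keep the block self-contained it uses a single hypothetical stand-in model (a linear regression on a time trend plus hour-of-day and day-of-week factors) rather than the lab's fitted workflows; with the real calibration table, the calls after `calibration_tbl` are the same.

```r
library(magrittr)
library(parsnip)
library(modeltime)

taylor_60_min <- timetk::taylor_30_min %>%
  timetk::summarise_by_time(.date_var = date, .by = "hour", value = sum(value))

splits <- timetk::time_series_split(
  taylor_60_min, date_var = date,
  initial = "2 months", assess = "1 week", cumulative = FALSE
)

# hypothetical stand-in model, fitted on the training period only
fit_hour_wday <- linear_reg() %>%
  set_engine("lm") %>%
  fit(value ~ as.numeric(date) + factor(lubridate::hour(date)) +
        factor(lubridate::wday(date)),
      data = rsample::training(splits))

calibration_tbl <- modeltime_table(fit_hour_wday) %>%
  modeltime_calibrate(new_data = rsample::testing(splits))

# predictions for the held-out week, overlaid on the observed series
test_fcst <- calibration_tbl %>%
  modeltime_forecast(new_data = rsample::testing(splits), actual_data = taylor_60_min)
test_plot <- plot_modeltime_forecast(test_fcst, .interactive = FALSE)

# refit on the full series, then forecast two weeks past the last observation
refit_tbl <- calibration_tbl %>% modeltime_refit(data = taylor_60_min)

future_fcst <- refit_tbl %>%
  modeltime_forecast(h = "2 weeks", actual_data = taylor_60_min)
future_plot <- plot_modeltime_forecast(future_fcst, .interactive = FALSE)
```

Refitting on all of the data before forecasting lets the final models use the most recent week, which was held out during calibration.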